1. The introduction about MetaCycle

1.1. General description about MetaCycle

The MetaCycle package is used to detect rhythmic signals in large scale time-series data. It uses three algorithms, ARSER (ARS), JTK_CYCLE (JTK), and Lomb-Scargle (LS), to detect periodic signals and can output integrated analysis results.

1.2. Types of time-series data

Usually, circadian time-series data is evenly sampled and the interval between time points is an integer (e.g. 4 hrs). However, sometimes there are replicate samples, missing values or un-evenly sampled series, or sampling with a non-integer interval. Examples of these types of data are shown in this table.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
The usual data CT0 CT4 CT8 CT12 CT16 CT20
With missing value CT0 NA CT8 CT12 CT16 CT20
With replicates CT0 CT0 CT8 CT8 CT16 CT16
With un-even interval CT0 CT2 CT8 CT10 CT16 CT20
With non-integer interval CT0 CT4.5 CT9 CT13.5 CT18 CT22.5

Some datasets can have one or more of these conditions, e.g.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
With replicates and missing value CT0 CT0 CT8 NA CT16 CT16
With un-even interval and replicates CT0 CT2 CT2 CT10 CT16 CT20

1.3. Method selection

The meta2d function in MetaCycle is designed to analyze different types of time-series datasets and can automatically select the proper methods to analyze different experimental designs. The implementation strategy used for meta2d is shown in the flow chart.

1.4. Integration strategies

In addition to selecting the proper method(s) to analyze different experimental designs, meta2d can also integrate results from multiple methods. A detailed explanation of integration strategies is in the vignettes of MetaCycle.

  • Pvalue
  • Period and phase
    • The integrated period is an arithmetic mean value of multiple periods, while phase integration is based on the mean of circular quantities.
  • Amplitude calculation
    • meta2d calculates the amplitude with following model: \[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]

    • B is the baseline level of the time-series data; TRE is its trend; A is the amplitude of the waveform. PHA and PER are integrated period and phase from above.
    • The baseline and trend are explained below.

  • About relative amplitude (rAMP)
    • meta2d outputs a relative amplitude value (rAMP), which normalizes amplitude to overall expression levels. For example, Ugt2b34 has a larger amplitude than Arntl, but its rAMP is smaller than Arntl.

1.5. Description about output values

Column name Description Column name Description
ARS_pvalue p-value from ARS LS_BH.Q FDR from LS
ARS_BH.Q FDR from ARS LS_period period from LS
ARS_period period from ARS LS_adjphase adjusted phase from LS
ARS_adjphase adjusted phase from ARS LS_amplitude amplitude from LS
ARS_amplitude amplitude from ARS meta2d_pvalue integrated pvalue
JTK_pvalue p-value from JTK meta2d_BH.Q FDR based on integrated pvalue
JTK_BH.Q FDR from JTK meta2d_period averaged period
JTK_period period from JTK meta2d_phase integrated phase
JTK_adjphase adjusted phase from JTK meta2d_Base baseline value given by meta2d
JTK_amplitude amplitude from JTK meta2d_AMP amplitude given by meta2d
LS_pvalue p-value from LS meta2d_rAMP relative amplitude

2. Install and check software and packages

# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")
# change working directory to the desktop 
setwd("~/Desktop")

# check required directories under the desktop
dir()

3. The MetaCycle demo

3.1. Open the meta2d web application

  • Option 1: use the ‘runGitHub’ function. Type the command below in the RStudio ‘Console’ (requires Internet connection)
# load shiny package
library("shiny")

# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")
  • Option 2: use the ‘runApp’ function. Type the command below in the RStudio ‘Console’
# load shiny package
library("shiny")

# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")
  • IMPORTANT NOTE: Windows users need to click the ‘Open in Browser’ button to download analyses to their computer.
  • To terminate the web application, click the red ‘stop’ button on top right of the RStudio ‘Console’.
  • If the web app is running, it will show:

3.2. Use the meta2d web app to analyze three use cases

  • case A (csv): 1) evenly sampled, 4h interval over 2 days (from CT18 to CT62) with no replicates or missing values

    • goal: use JTK_CYCLE to detect circadian transcripts
    • step 1: click ‘Choose File’ button to upload case A dataset from ‘data’ directory
    • step 2: set the parameters as shown below (set ‘minper’ and ‘maxper’ to 24 and select ‘JTK’ from ‘cycMethod’) and click ‘Run’ button
    • step 3: check the analysis results
    • step 4 (Mac users): click ‘download’ button to download the output to the Desktop and name the output file “meta2d_caseA.csv”
    • step 4 (Windows users): Change the default download folder to Desktop and follow the instructions for Chrome or Firefox. Click ‘download’ button to download the output. There will be a file named ‘meta2d.csv’ in the download folder. Manually re-name this file ‘meta2d_caseA.csv’.
  • case B (txt): 1) evenly sampled, 6h interval over two days (from CT20 to CT62), no replicates, but 1 missing time point (CT50)

    • goal: use JTK_CYCLE and Lomb-Scargle to detect circadian transcripts
    • steps: 1) as above, upload case B dataset from ‘data’ directory, 2) parameters setting: select ‘txt’ under fileStyle, set ‘minper’ and ‘maxper’ to 24, and select ‘JTK’ and ‘LS’ under ‘cycMethod’, 3) click ‘Run’ button, 4) click ‘Download’ button to save output (name it as ‘meta2d_caseB.txt’), 5) you should see a window like the below during the analysis
  • case C (csv): 1) cell cycle data from yeast, 2) evenly sampled, 16 min interval from 2 min to 162 min, 3) no replicates or missing values, 4) the expected period is 85 min

    • goal: use ARSER, JTK_CYCLE, and Lomb-Scargle to detect cycling transcripts in the yeast cell cycle
    • steps: 1) upload case C dataset from ‘data’ directory as above, 2) parameter settings: select ‘csv’ under fileStyle, set ‘minper’ as 80 and ‘maxper’ as 96, set ‘ARSdefaultPer’ as 85, and select all three methods under ‘cycMethod’, 3) click ‘Run’ button, 4) click ‘Download’ button to save output (name it as ‘meta2d_caseC.csv’), 5) you could see a window like below during analyzing caseC dataset

3.3. Homework: Practice MetaCycleApp. There are five exercise datasets (file names containing ‘exercise’) in ‘data’ directory, please analyze exerciseA and exerciseB within this workshop, and do the remaining after this workshop.

  • exerciseA (txt): 1) evenly sampled with 4h interval covering one day (from CT19 to CT39), 2) no replicates or missing value
  • exerciseB (csv): 1) evenly sampled with 4h interval covering two days, 2) three replicates at each time point, 3) no missing value
  • exerciseC (csv): 1) evenly sampled with 4h interval covering two days, 2) varied number of replicates at each time point, 3) no missing value
  • exerciseD (csv): 1) evenly sampled with 3h interval covering two days (from CT0 to CT45), 2) three replicates at each time point 3) there is random missing value in some rows
  • exerciseE (csv): 1) un-evenly sampled from CT19 to CT57, 2) no replicates or missing value
  • general advice: 1) select all three methods under ‘cycMethod’ and MetaCycleApp will automaticlly select methods that could be used to analyze input dataset, 2) to search circadian profiles, it is better to set both ‘minper’ and ‘maxper’ as 24, 3) select ‘csv’ or ‘txt’ under ‘fileStyle’ according to input file, 4) name outputs as ’meta2d_exercise*’

3.4. Analyze experimental data (this may take several minutes)

  • experimentA: 1) evenly sampled with 2h interval covering two days (from CT18 to CT64), 2) no replicates or missing values, 3) sampling time information is numeric value in the header, 4) 15K probesets (about 1/3 of total probesets for MOE4302 array)
  • general advice: 1) select ‘JTK’ and ‘LS’ in analyzing high-resolution datasets (for 2h over 2 days, ‘ARS’ has a higher false positive rate), 2) set ‘minper’ and ‘maxper’ both as ‘24’ for searching circadian transcripts, 3) save output on Desktop and name it as ‘meta2d_experimentA.csv’, 4) you should see the following while analyzing this dataset

3.5. John Hogenesch will lead a short discussion while MetaCycleApp’ runs

  • dicussion A: Why is 2 days better than 1?

    • datasets: ‘dicussionA_1day.csv’ and ‘dicussionA_2days.csv’ are in the ‘data’ directory, which are re-sampled (5000 probesets, 1h / 1day or 1h / 2days) from ‘Hughes2009_MouseLiver1h.txt’
    • MetaCycleApp parameters: minper=maxeper=24, cycMethod=c(“JTK”, “LS”)
    • outputs: ‘meta2d_dicussionA_1day.csv’ and ‘meta2d_dicussionA_2days.csv’ are in ‘result’ directory
  • dicussion B: Why isn’t there better overlap?

    • datasets: ‘dicussionB_CT18.csv’ and ‘dicussionB_CT19.csv’ are in ‘data’ directory, which are re-sampled (5000 probesets, 2h / 2days begins at CT18 or CT19) from ‘Hughes2009_MouseLiver1h.txt’
    • MetaCycleApp parameters: minper=maxeper=24, cycMethod=c(“JTK”, “LS”)
    • outputs: ‘meta2d_dicussionB_CT18.csv’ and ‘meta2d_dicussionB_CT19.csv’ are in ‘result’ directory

3.6. Check the expression profiles of circadian transcripts (FDR < 0.05) and show their phase distribution

  • Type below command in the RStudio ‘Console’ to prepare a dataframe (‘figureD’) used for plotting figures

# copy the 'meta2d_experimentA.csv' file from Desktop to 'result' directory in SRBR_SMTSAworkshop-master
file.copy(from = "~/Desktop/meta2d_experimentA.csv", 
           to = "~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv", overwrite = TRUE)
# load dplyr package
library("dplyr")

# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv", stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)
##  [1] "CycID"         "JTK_pvalue"    "JTK_BH.Q"      "JTK_period"   
##  [5] "JTK_adjphase"  "JTK_amplitude" "LS_pvalue"     "LS_BH.Q"      
##  [9] "LS_period"     "LS_adjphase"   "LS_amplitude"  "meta2d_pvalue"
## [13] "meta2d_BH.Q"   "meta2d_period" "meta2d_phase"  "meta2d_Base"  
## [17] "meta2d_AMP"    "meta2d_rAMP"   "X18"           "X20"          
## [21] "X22"           "X24"           "X26"           "X28"          
## [25] "X30"           "X32"           "X34"           "X36"          
## [29] "X38"           "X40"           "X42"           "X44"          
## [33] "X46"           "X48"           "X50"           "X52"          
## [37] "X54"           "X56"           "X58"           "X60"          
## [41] "X62"           "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)
## [1] 706
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns for drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase, num_range("X", seq(18, 64, by=2), width = 2))
  • Type the below command in the ‘Console’ to draw a heatmap of circadian transcripts (if the code runs well, a heatmap figure will show in the bottom right window)

# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")

# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)

heatmapFigure

  • Type the below command in the ‘Console’ to draw a histogram of the phase distribution of circadian transcripts (if the code runs, a histogram will show in the bottom right window)

# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")

histFigure

4. The PSEA demo

Phase Set Enrichment Analysis (PSEA) is designed to incorporate prior gene set knowledge in the analysis of periodic data. PSEA can identify gene sets showing temporally coordinated expression. For more info on PSEA, please see Ray Zhang’s PSEA paper.

  • Try to open a ‘jar’ file

    • Go to “PSEA-master” directory, and could see ‘PSEA1.1_VectorGraphics.jar’
    • Right click this file and click ‘open’. On mac computer, it may show below dialog, and just click ‘Open’.
    • If these this ‘jar’ file is successfully opened, you will see the below dialog (If not, please follow README file to install latest Java).

4.1. Prepare the file used for PSEA analysis

  • Type below command in the ‘Console’ window of RStudio to prepare the file used to PSEA analysis

# take a look at column names of 'cirD' dataframe
colnames(cirD)
##  [1] "CycID"         "JTK_pvalue"    "JTK_BH.Q"      "JTK_period"   
##  [5] "JTK_adjphase"  "JTK_amplitude" "LS_pvalue"     "LS_BH.Q"      
##  [9] "LS_period"     "LS_adjphase"   "LS_amplitude"  "meta2d_pvalue"
## [13] "meta2d_BH.Q"   "meta2d_period" "meta2d_phase"  "meta2d_Base"  
## [17] "meta2d_AMP"    "meta2d_rAMP"   "X18"           "X20"          
## [21] "X22"           "X24"           "X26"           "X28"          
## [25] "X30"           "X32"           "X34"           "X36"          
## [29] "X38"           "X40"           "X42"           "X44"          
## [33] "X46"           "X48"           "X50"           "X52"          
## [37] "X54"           "X56"           "X58"           "X60"          
## [41] "X62"           "X64"
# if it reports error after typing above command, please re-run the code of preparing 'figureD' in the first part of 3.6 section of this demo

# select its 'CycID', "meta2d_BH.Q", 'meta2d_phase' columns for later analysis
phaseD <- select(cirD, CycID, meta2d_BH.Q, meta2d_phase)

# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt", stringsAsFactors = FALSE)

# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)
##        CycID  meta2d_BH.Q meta2d_phase ENTREZID  SYMBOL
## 1 1416273_at 6.267205e-04    0.9112647    21928 Tnfaip2
## 2 1418853_at 1.005625e-04    2.2190313    28194    Apon
## 3 1418322_at 1.267709e-03   21.7744648    12916    Crem
## 4 1435748_at 5.230269e-08   16.5923801    14544     Gda
## 5 1448993_at 3.079038e-02   16.0442955    67841    Atg3
## 6 1428889_at 4.627240e-02    4.8168048    69113  Alkbh3
# filter out those probesets without annotation information
phaseD <- filter(phaseD, SYMBOL != "NA")

# select 'SYMBOL', 'meta2d_BH.Q' and 'meta2d_phase' columns for getting a dataframe without duplicate gene names
ori_phaseD <- select(phaseD, SYMBOL, meta2d_BH.Q, meta2d_phase)
# take a look at the row number with possilbe duplicate gene names
dim(ori_phaseD)
## [1] 698   3
# load the 'uniF' function from 'fig.R' for doing this work
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get a dataframe without duplicate gene names
uni_phaseD <- uniF(ori_phaseD)
# take a look at the rownumber now
dim(uni_phaseD)
## [1] 643   3
# select 'SYMBOL' and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(uni_phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)
##          SYMBOL meta2d_phase
## 1 1110059E24Rik   15.0817948
## 2 1190002N15Rik   15.7701032
## 3 1700023H06Rik    6.0991388
## 4 1810055G02Rik   11.2992141
## 5 2310035C23Rik    1.6562627
## 6 2610507B11Rik    0.8431068
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", sep = "\t", quote = FALSE, row.names = FALSE)
  • Check the ‘result’ directory and make sure there is a file named ‘experimentPSEA.txt’

4.2. Convert the mouse gene name to its human homolog gene name

  • Type the below command in the ‘Console’ window of RStudio to convert the mouse gene name to its human homolog gene name

# read in the 'experimentPSEA.txt' file
ori_pseaD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", stringsAsFactors = FALSE)
# take a look at the data
head(ori_pseaD)
##          SYMBOL meta2d_phase
## 1 1110059E24Rik   15.0817948
## 2 1190002N15Rik   15.7701032
## 3 1700023H06Rik    6.0991388
## 4 1810055G02Rik   11.2992141
## 5 2310035C23Rik    1.6562627
## 6 2610507B11Rik    0.8431068
# read in the 'MouseHumanHomolog.txt' file in 'data-raw' directory of SRBR_SMTSAworkshop-master
homoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/MouseHumanHomolog.txt", stringsAsFactors = FALSE)
# take a look at the data
head(homoD)
##   Mouse_gene Human_gene
## 1      Acadm      ACADM
## 2     Acadvl     ACADVL
## 3      Acat1      ACAT1
## 4      Acvr1      ACVR1
## 5       Sgca       SGCA
## 6       Adsl       ADSL
# join mouse gene and human homolog gene with 'inner_join' function
homo_pseaD <- inner_join(ori_pseaD, homoD, by=c("SYMBOL" = "Mouse_gene"))
# take a look at the joined dataframe
head(homo_pseaD)
##          SYMBOL meta2d_phase Human_gene
## 1 1110059E24Rik   15.0817948    C9orf85
## 2 1190002N15Rik   15.7701032    C3orf58
## 3 1810055G02Rik   11.2992141   C11orf24
## 4 2310035C23Rik    1.6562627   KIAA1468
## 5 2610507B11Rik    0.8431068   KIAA0100
## 6 2700094K13Rik    9.9926117   C11orf31
# select "Human_gene" and "meta2d_phase" columns for PSEA analysis
outD <- select(homo_pseaD, Human_gene, meta2d_phase)
# take a look at the selected data
head(outD)
##   Human_gene meta2d_phase
## 1    C9orf85   15.0817948
## 2    C3orf58   15.7701032
## 3   C11orf24   11.2992141
## 4   KIAA1468    1.6562627
## 5   KIAA0100    0.8431068
## 6   C11orf31    9.9926117
# write it to a file named-'human_experimentPSEA.txt' in 'result' directory of SRBR_SMTSAworkshop-master
write.table(outD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/human_experimentPSEA.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
  • Go to ‘result’ directory of SRBR_SMTSAworkshop-master and you will find the output file ‘human_experimentPSEA.txt’.

4.3. Do PSEA analysis with ‘PSEA1.1_VectorGraphics.jar’

  • Go to ‘PSEA-master’ directory and open ‘PSEA1.1_VectorGraphics.jar’

  • Click ‘Browse’ button under ‘STEP 1 Select items file:’ to upload the input file (‘human_experimentPSEA.txt’ in ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Set corresponding parameters under ‘STEP 3 Select parameters:’ according to the sepecific study (here we use default values, except set q value < 0.1) as shown below

  • Click ‘Browse’ button under ‘STEP 4 Select output folder:’ to select the directory used to store output files (select ‘PSEAout’ directory within ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Click ‘START’ button under ‘STEP 5 Generate output:’ to generate output files and ‘START’ button will change to ‘DONE’ button as shown below

  • Go to ‘PSEAout’ directory in ‘result’ directory and you will find three output files (‘vsUniform.svg’, ‘vsBackground.svg’ and ‘results.txt’) and two output directories (‘vsUniform’ and ‘vsBackground’). Open ‘vsUniform.svg’, you will see all phase enriched items with significant q value, look like below

  • Go to ‘vsUniform’ directory within ‘PSEAout’ directory and you will find gene list corresponding to each enriched item shown on the ‘vsUniform.svg’. There are also figures showing the phase distribution of listed genes for each enriched item, look like below

5. Try your data or share your experience of analyzing time-series data